The relative contribution of segments and intonation to the perception of foreign-accented speech
Abstract
The present study examines the relative impact of segments and intonation on accentedness, comprehensibility, and intelligibility, specifically investigating the separate contribution of segmental and intonational information to perceived foreign accent in Korean-accented English. Two English speakers and two Korean speakers recorded 40 English sentences. The sentences were manipulated by combining segments from one speaker with intonation (fundamental frequency contour and duration) from another speaker. Four versions of each sentence were created: one English control (English segments and English intonation), one Korean control (Korean segments and Korean intonation), and two Korean–English combinations (one with English segments and Korean intonation; the other with Korean segments and English intonation). Forty native English speakers transcribed the sentences for intelligibility and rated their comprehensibility and accentedness. The data show that segments had a significant effect on accentedness, comprehensibility, and intelligibility, but intonation only had an effect on intelligibility. Contrary to previous studies, the present study, separating segments from intonation, suggests that segmental information contributes substantially more to the perception of foreign accentedness than intonation. Native speakers seem to rely mainly on segments when determining foreign accentedness. When adults learn to speak a second language, their nonnative speech is often identified as accented (Flege, Munro, & MacKay, 1995). Second language acquisition typically requires learning new motor patterns, creating new phoneme categories, and reorganizing existing phoneme boundaries. Adult learners’ difficulty in mastering any of these aspects leads to pronunciations that deviate from the native norm, resulting in speech that sounds accented and may also be difficult to understand (e.g., Anderson-Hsieh, Johnson, & Koehler, 1992; Edwards & Zampini, 2008; Munro & Derwing, 1995). 
These difficulties in second language pronunciation typically include both segmental and suprasegmental features (e.g., Edwards & Zampini, 2008; Major, 2001; van Els & de Bot, 1987).
© Cambridge University Press 2014 0142-7164/14 Applied Psycholinguistics 37:2 304 Sereno et al.: Segments and intonation in accented speech
The current study investigates the separate contribution of segmental and suprasegmental information, specifically intonation, to perceived foreign accent in Korean-accented English. Intonation refers to “the distinctive use of pitch over units larger than a single word” (Reetz & Jongman, 2009, p. 221). The goal is to determine the extent to which segmental and intonational information affect the perception of foreign-accented speech. Evaluation of second language speech can involve measures of intelligibility, comprehensibility, or accentedness (Munro & Derwing, 1999). Accentedness and comprehensibility are both measured through listeners’ judgments of speech samples, but intelligibility is an objective measure of how much of a sample was correctly understood. Intelligibility is broadly defined as the critical measure of the extent to which speech is understood by a listener (e.g., Munro & Derwing, 1995). While commonly used to assess the transmission of information in, for example, a noisy environment, intelligibility is gaining ground as one of the most useful measures to assess nonnative speech because making oneself understood in a foreign language is arguably the most important component of successful communication. Intelligibility measures range from orthographic transcription of sentences or key words to intelligibility ratings on a Likert scale (e.g., Fayer & Krasinski, 1987; Gass & Varonis, 1984). As pointed out by Munro and Derwing (1999), pronunciation experts have long emphasized improved intelligibility as the most important objective of pronunciation teaching.
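To make the transcription-based notion of intelligibility concrete, the following is a minimal sketch of one way such a score could be computed: the proportion of target words that also appear in a listener's transcription. The function names, the tokenization rule, and the exact-match criterion are illustrative assumptions, not the scoring procedure of any study cited here.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep only alphabetic word tokens (assumed rule)."""
    return re.findall(r"[a-z']+", text.lower())


def proportion_words_correct(target, transcription):
    """Return the share of target words matched in the transcription.

    Words are matched as a multiset, ignoring order; this is a simplifying
    assumption for illustration only.
    """
    target_counts = Counter(tokenize(target))
    heard_counts = Counter(tokenize(transcription))
    total = sum(target_counts.values())
    if total == 0:
        raise ValueError("target sentence contains no words")
    # Counter intersection keeps, per word, the smaller of the two counts.
    matched = sum((target_counts & heard_counts).values())
    return matched / total


# Hypothetical example: one of five target words is mistranscribed.
score = proportion_words_correct("The boy ate the apple.", "the boy hate the apple")
print(score)  # 0.8
```

A comprehensibility or accentedness rating, by contrast, would simply be the number a listener selects on a rating scale, with no comparison against a target.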
Comprehensibility is a measure of how difficult or easy an utterance is to understand, using a rating scale, without actually checking listeners’ accuracy. It is therefore a measure of native speakers’ perception of intelligibility (Derwing & Munro, 1997). Accentedness is a measure of how strong the foreign speaker’s accent is perceived to be (Munro & Derwing, 1995). Accentedness is also typically measured along a rating scale and allows native listeners to indicate the extent to which they feel a nonnative speaker’s utterance deviates from native-speaker norms. While intelligibility, comprehensibility, and accentedness are related, they are partially independent (e.g., Munro & Derwing, 1998). For example, studies have shown that moderately or highly accented utterances could nevertheless be highly intelligible and comprehensible (e.g., Derwing & Munro, 1997; Munro & Derwing, 1998). Such data emphasize the need for studies that assess all three components of nonnative speech production. Pronunciation errors can include both segmental and suprasegmental deviations. Segmental errors are errors in the production of individual consonants and vowels (Anderson-Hsieh et al., 1992; Flege & Hillenbrand, 1984). Segmental errors include the substitution of one sound for another or the modification of a sound (e.g., Broselow, Chen, & Wang, 1998; Flege, Bohn, & Jang, 1997; Major & Faudree, 1996). Suprasegmental errors include errors in stress assignment, intonation, timing, phrasing, and rhythm (Anderson-Hsieh et al., 1992; Koster & Koet, 1993; Munro, 1995). Suprasegmental errors can further be divided into those that influence speech fluency, including speech rate, pause frequency, and pause duration, and those that influence speech melody, including tonal peak alignment and stress timing (Trofimovich & Baker, 2006). The impact of these errors on accentedness and comprehensibility has been examined in several studies (Anderson-Hsieh et al.,
1992; Anderson-Hsieh & Koehler, 1988; Derwing, Munro, & Wiebe, 1998; Munro & Derwing, 1999; Trofimovich & Baker, 2006). In an early study, Anderson-Hsieh et al. (1992) studied the relationship between listeners’ judgments of foreign-accented speech and the types of errors present in the speech. Native English listeners rated speech samples of nonnative speakers from a great variety of languages (Arabic, Armenian, Assamese, Chinese, Farsi, German, Greek, Hindi, Indonesian, Kannada, Korean, Malayalam, Punjabi, Serbo-Croatian, Spanish, and Tamil) and a variety of proficiency levels in terms of their English pronunciation. After auditory analysis of the speech samples for phoneme, syllable, and prosody errors, the correlation between these errors and the native listener ratings was determined. The researchers found that, although both segmental errors and prosody errors correlated with pronunciation ratings, the prosody rating was more highly correlated with pronunciation ratings (Anderson-Hsieh et al., 1992). However, the scale used to judge overall pronunciation in the reported study did not distinguish between accentedness and comprehensibility, and instead combined the two. Furthermore, because the speech samples contained both prosodic and phonemic errors, it is not clear to what extent the pronunciation ratings were really able to separately assess the two types of information. Munro and Derwing (1999) conducted a study with similar methods, but they specifically looked at the relationships among accentedness ratings, comprehensibility ratings, and intelligibility. Although the role of different types of errors in nonnative speech in these ratings was not the main focus of the study, it was statistically examined. In this study, native English listeners transcribed and rated accentedness and comprehensibility of sentences spoken in English by native Mandarin speakers.
Later, linguists determined the number of phonemic errors and rated the intonation of each speech sample. In the analysis, the correlation between each of these error scores and accentedness, comprehensibility, and intelligibility was determined. The most relevant finding was that nonnative intonation correlated more with accentedness and comprehensibility than did the nonnative phonemic errors (Munro & Derwing, 1999). However, once again, it is difficult to separate segmental and prosodic contributions completely because both were present in the speech sample and may have affected each other in the ratings. A study by Derwing et al. (1998) on the efficacy of different methods of foreign-language instruction revealed that segments and prosody may have varying impacts on accentedness and comprehensibility, depending on the type of speech sample under investigation. In this study, adult nonnative speakers in English as a second language classes were placed in one of three training groups: no specific training, training on segmental features, or training on global features such as speaking rate, intonation, rhythm, and stress (all suprasegmental features). Both before and after 12 weeks of training, native speakers listened to two types of speech samples from the speakers: simple sentences that were read aloud and an extemporaneous description of a picture story. For the single-sentence samples, comprehensibility improved for both groups who had training (either segmental or global training). However, accentedness improved much more for those who had segmental training than for those who had global training. A different pattern emerged for the extemporaneous speech samples, however. Accentedness ratings did not improve for any of the groups, and comprehensibility improved only for those who had global training (Derwing et al., 1998).
This study did separate segmental from suprasegmental features in training, finding that the nature of the training regime (segmental or suprasegmental) differentially affected ratings. The issue of separately examining segmental and suprasegmental information was addressed by Trofimovich and Baker (2006) in a study that focused on effects of second language (L2) experience on suprasegmentals. In this study, the researchers recorded simple English sentences produced by native Korean speakers and removed segmental information through low-pass filtering. These sentences were then presented to English listeners, who rated them for accentedness. In addition, the researchers analyzed fluency factors (pause duration, pause frequency, and speech rate) and speech melody factors (tonal peak alignment and stress timing). This study found that even the nonnative speech of residents of over 10 years still sounded accented at the suprasegmental level, showing that suprasegmentals do carry some aspect of accent. Further, the factors that were most predictive of accentedness ratings were pause duration and speech rate (factors involved in speech fluency; Trofimovich & Baker, 2006). Although this study did separately examine suprasegmentals by eliminating segmental information from the presented utterances, only accentedness ratings were collected. Most of the studies investigating accented speech productions have examined only accentedness or combined accentedness and comprehensibility. However, accentedness may not always overlap with comprehensibility. Several studies have shown that, although accentedness, comprehensibility, and intelligibility correlate, it is not always the case that a highly accented speaker has low intelligibility or comprehensibility (Derwing & Munro, 1997; Kashiwagi & Snyder, 2010; Munro & Derwing, 1995, 1999). 
In two studies by Munro and Derwing (1995, 1999), accentedness was often rated much more harshly than comprehensibility, and for fully (100%) intelligible samples, accentedness ratings varied greatly (see also Kashiwagi & Snyder, 2010). Furthermore, in Derwing and Munro (1995), accentedness was not related to response times, showing that accentedness does not necessarily make the sentence less comprehensible. It is clear that the relation of segmentals and suprasegmentals to accentedness, comprehensibility, and intelligibility is something that needs to be further examined. A few studies have employed signal manipulation to assess the role of prosody in the perception of a foreign accent. In an early study on Dutch-accented English, Willems (1982) collected acceptability ratings for three sets of synthesized sentences: native British English (BE), native BE with one fundamental frequency (F0) deviation, and Dutch-accented BE. Willems found that the direction and magnitude of F0 movement as well as the production of a rise at the end of wh-questions most strongly affected acceptability judgments. Jilka (2000) specifically focused on the role of intonation. He replicated and expanded Willems’s approach with a focus on German and American English. His comparison of foreign accent ratings for original foreign-accented utterances and versions that were systematically manipulated on the basis of the category-oriented Tones and Break Indices framework of intonation description (e.g., Beckman, Hirschberg, & Shattuck-Hufnagel, 2005), for example, showed that the manipulated utterances were judged as having less of an accent. In a further extension, Jilka (2000) found that listeners were worse at deciding which language they heard in a monotonous low-pass filtered condition, leading him to conclude that intonation per se is of crucial importance to the perception of a foreign accent.
To directly assess the contribution of intonation to the perception of foreign accentedness, Jilka (2000) compared accentedness ratings for pairs of sentences consisting of a foreign-accented utterance and the same utterance in which the intonation had been replaced by a correct, unaccented intonation pattern by means of rule-based generation and resynthesis. In addition, fully synthesized utterances were included that retained the original foreign-accented intonation contour but in which any segmental foreign accent was removed by using concatenative diphone synthesis. Thus, this evaluation involved a comparison between fully synthesized and original/resynthesized utterance pairs. Comparison of foreign accent ratings of the “natural” rule-generated version with those of the synthesized version of the same utterance showed that the synthesized stimuli were generally rated as less accented. Given these data, Jilka (2000) concluded that segmental information contributes more strongly to perceived foreign accent than intonation. Winters and O’Brien (2013) recently investigated the relative contributions of F0 and duration to perceived accentedness and intelligibility, also examining typologically similar languages. In this study, English and German speakers produced both English and German sentences (both groups were highly proficient in their L2), and these utterances were evaluated by English monolingual listeners as well as English and German proficient L2 listeners. Native intonation contours and syllable durations or only syllable durations were combined with native or nonnative segmental productions. Most important for the present purpose are the conditions in which native English segments were combined with German-accented prosody (intonation and duration) and in which German-accented segments were combined with English prosody (intonation and duration). 
While the overall results showed that nonnative segments affected both accentedness and intelligibility, accentedness and intelligibility ratings diverged for nonnative segments with native prosody. It should be noted that sentences were presented in the clear for accentedness ratings but in pink noise for intelligibility ratings. Winters and O’Brien found that the combination of German-accented segments with English prosody reduced the perceived accentedness of the nonnative speech, suggesting that segments may contribute more to perceived foreign accent than prosody. However, for intelligibility, German-accented segments were less intelligible when combined with English prosody. Although Winters and O’Brien’s overall conclusion is that segmental cues do seem to improve perceived accentedness in nonnative speech more than prosodic cues do, nonnative prosody does seem to play a role by reducing intelligibility. Holm (2009) investigated the contribution of intonation and duration to the perception of foreign-accented Norwegian spoken by native speakers from seven different languages (English, French, German, Mandarin, Persian, Russian, and Tamil). In one condition, the duration of each segment in the L2 Norwegian productions was changed to that in native Norwegian productions. In a second condition, the native Norwegian intonation contour was simplified (“stylized”) and copied onto the corresponding L2 Norwegian production. In a third condition, both duration and intonation were manipulated by copying the stylized native Norwegian intonation contour onto L2 Norwegian productions with native Norwegian durations. Listeners had to indicate whether one sentence of the pair was far less accented, less accented, or equally accented relative to the other sentence.
In general, results showed that the combined manipulation of duration and intonation significantly reduced the degree of perceived foreign accent for all native speakers, although certain native languages (e.g., Tamil and Mandarin) improved more than others (e.g., Russian and German). Holm also collected intelligibility ratings for different manipulations of the same sentence presented in pink noise. The results showed that the intonation manipulation improved intelligibility for English, German, Tamil, and Russian, while duration was most important for French. For Chinese and Persian, there was no difference among the intelligibility of the manipulations. However, it is somewhat difficult to interpret these intelligibility results and relate them to the accentedness results. Because the original accented nonmanipulated sentences were significantly more intelligible than the manipulated ones, they could not serve as a baseline condition. To remedy this, “close original” duration and intonation stimuli were created. Consequently, given these additions, the accentedness ratings and intelligibility ratings were not collected for the same set of manipulations. Tajima, Port, and Dalby (1997) investigated the importance of temporal properties for intelligibility. They used linear predictive coding resynthesis and dynamic time warping to change the temporal properties of Chinese-accented English to match those of native English speech while leaving the original formant frequencies and F0 intact. Intelligibility was measured in a forced identification task with the target and three distracter sentences. Results showed a significant improvement in intelligibility for sentences with modified temporal properties as compared to the original Chinese-accented utterances. In addition, manipulation in the opposite direction, changing native English temporal attributes to match those of Chinese-accented English, significantly reduced intelligibility.
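Dynamic time warping, the alignment technique mentioned above, can be sketched in a few lines. The example below aligns two short numeric sequences and returns the total alignment cost together with the warping path. It is a textbook version under stated assumptions (one-dimensional inputs, absolute difference as the local cost, hypothetical function name `dtw_align`), not the implementation used by Tajima et al. (1997), who worked on acoustic segments of speech.

```python
def dtw_align(a, b):
    """Align sequences a and b with classic dynamic time warping.

    Returns (total_cost, path), where path is a list of (i, j) index pairs
    saying that a[i] is matched with b[j].
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # assumed local distance
            cost[i][j] = local + min(
                cost[i - 1][j - 1],  # advance in both sequences
                cost[i - 1][j],      # advance in a only (b[j-1] is stretched)
                cost[i][j - 1],      # advance in b only (a[i-1] is stretched)
            )
    # Backtrack from the end to recover the warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path


# Toy example: the second sequence holds its middle value one step longer.
total, path = dtw_align([1, 2, 3], [1, 2, 2, 3])
print(total)  # 0.0
print(path)   # [(0, 0), (1, 1), (1, 2), (2, 3)]
```

In the path, index 1 of the first sequence is paired with two consecutive indices of the second, which is the kind of local stretching a duration manipulation would then apply to the signal.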
Quené and Van Delft (2010) recently replicated this latter finding in a sophisticated design. Using native Dutch and Polish-accented Dutch, segmental durations were swapped between the two versions while differences in speaking rate and intonation were removed. Pitch synchronous overlap and add (Moulines & Charpentier, 1990) was used to adjust the segment durations of one sentence version to match those of the other version, and vice versa. Using the speech reception threshold method (Plomp & Mimpen, 1979), sentences were presented in masking noise that matched the speech signal. Dutch listeners had to repeat each sentence out loud, and the signal-to-noise ratio was decreased or increased depending on whether their response was correct or incorrect, respectively, until accuracy had reached 50%. The results showed that Polish-accented sentences with Dutch segmental durations were slightly, but significantly, more intelligible than the original Polish-accented sentences. In a replication of Tajima et al. (1997), Quené and Van Delft (2010) also found that the original Dutch sentences were more intelligible than Dutch sentences with segment durations from the Polish-accented versions. The latter manipulation, native Dutch with Polish-accented durations, affected intelligibility more than the former, Polish-accented Dutch with native Dutch durations. This led Quené and Van Delft (2010) to conclude that hearing nativelike speech with inappropriate durational information may be relatively worse than hearing nonnativelike speech with appropriate durational information. This is similar to Winters and O’Brien’s (2013) finding for resynthesized German-accented English productions that intelligibility decreases when nonnative segments are combined with native prosody as compared to nonnative prosody.
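The adaptive logic of the speech reception threshold method described above (lower the signal-to-noise ratio after a correct response, raise it after an incorrect one) can be illustrated with a simple one-up/one-down staircase. The sketch below is a simplified simulation: the step size, starting level, number of trials, number of discarded trials, and the deterministic listener are all assumptions chosen for illustration, not the parameters of Plomp and Mimpen (1979) or Quené and Van Delft (2010).

```python
def run_staircase(respond, start_snr_db=0.0, step_db=2.0, n_trials=20):
    """Run a one-up/one-down adaptive track.

    respond(snr_db) returns True when the listener repeats the sentence
    correctly at that signal-to-noise ratio. Returns the SNR of every trial.
    """
    snr = start_snr_db
    track = []
    for _ in range(n_trials):
        track.append(snr)
        if respond(snr):
            snr -= step_db  # correct: make the next trial harder
        else:
            snr += step_db  # incorrect: make the next trial easier
    return track


def estimate_threshold(track, discard=4):
    """Average the SNRs after an initial run-in of `discard` trials."""
    tail = track[discard:]
    return sum(tail) / len(tail)


def toy_listener(snr_db, true_threshold_db=-5.0):
    """Hypothetical listener: always correct at or above a fixed threshold."""
    return snr_db >= true_threshold_db


track = run_staircase(toy_listener)
print(estimate_threshold(track))  # -5.0
```

Because each correct response is followed by a harder trial and each error by an easier one, the track hovers around the level at which the two outcomes are equally likely, which is why the procedure converges on the 50% point.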
It is important to note that, while for Quené and Van Delft (2010) native durations improved intelligibility, their contribution was relatively small compared to the contribution of segmental information. That is, a much larger difference in intelligibility was observed between the original Dutch sentences and the Polish-accented sentences with Dutch durations. Because speaking rate and intonation were controlled for, this difference must be due to segmental deviations. Previous research has established that nonnative suprasegmentals give rise to the perception of a foreign accent and that nonnative intonation and timing each contribute to this perception. While the extent to which different suprasegmental features contribute to a foreign accent depends on both the L2 learners’ native language (Holm, 2009) and on the target language, there is little agreement on the relative contribution of segments and suprasegmentals to perceived foreign accent. While some studies suggest that suprasegmental aspects are more important (e.g., Anderson-Hsieh et al., 1992; Anderson-Hsieh & Koehler, 1988; Magen, 1998; Munro & Derwing, 1999), others suggest a primary role for segments (e.g., Jilka, 2000; Quené & Van Delft, 2010). The results of previous studies are not conclusive for a variety of reasons. In previous studies, listeners were asked to judge one aspect of nonnative speech while ignoring another, making it difficult to fully tease apart the individual contribution of segmental and suprasegmental information. For example, in Anderson-Hsieh et al. (1992), judges rated accented speech samples in terms of both segmental and suprasegmental errors; because both segments and suprasegmentals are simultaneously present in the samples, segmental errors may have affected suprasegmental ratings and vice versa.
Moreover, Jilka’s (2000) conclusion that segmental information contributes more strongly to perceived foreign accent than intonation was based on a comparison between fully synthetic and natural resynthesized sentences, which introduces naturalness of the materials as a potential confounding factor as well. Finally, because the focus of Quené and Van Delft (2010) was on nonnative durational patterns, their results can only indirectly suggest that segmental errors play a more significant role in foreign accentedness. In the present study, we will attempt to separate segmental features from suprasegmental features and evaluate the contribution of each to perceived foreign accent. At the suprasegmental level, we will focus on intonation, which has been shown to play a role in the perception of foreign accent (e.g., Holm, 2009; Jilka, 2000; Willems, 1982). Specifically, using digital signal processing techniques, we will separate the intonation from the segmental content for both native and nonnative speech samples and superimpose the intonation of one onto the segmental content of the other. Furthermore, in the present study, we will examine accentedness, comprehensibility, and intelligibility separately to determine exactly how segmental and suprasegmental errors impact all three evaluation methods. We focus on Korean-accented English (Flege, 1999; Flege et al., 2006) because there are both substantial prosodic and segmental differences between Korean and English. While English is typically described as a stress-timed language, the rhythm structure of Korean seems to fall in between that of syllable-timed and mora-timed languages (Arvaniti, 2012; Tark, 2012). The prosody of Korean has also been analyzed in detail (e.g., Jun, 1996) and the intonation of Korean-accented English has been documented in several studies (e.g., Jun, 1998; Trofimovich & Baker, 2006).
For example, in their acoustic analysis of F0 peak alignment in Korean-accented English, Trofimovich and Baker (2006) report that the maximum F0 value occurs significantly later in the stressed syllable than in native English. This difference is presumably because while the pitch peak in English is usually aligned with the onset of the stressed syllable (Ladd, Mennen, & Schepman, 2000), the pitch peak in Korean is typically aligned with the offset of the stressed syllable (Jun, 1998; Lee, 2013). In their study of Korean-accented declarative English sentences, Kim and Kim (2001) also found that the pitch accent typically fell on the last syllable of the phonological word in focus. In addition, they report that Korean speakers exhibit distinct tonal patterns for phrases (e.g., the low-high-low-high tone sequence of Korean accentual phrases) that sometimes give the impression that they are asking a question rather than making a statement in English. Kim and Kim conclude that these two instances of Korean-to-English transfer result in intonation patterns in Korean-accented English that are clearly distinct from those of natively produced English. In terms of segments, there are a number of differences between Korean and English that may pose difficulty for Korean learners of English. Korean has alveolar and glottal but no interdental or postalveolar fricatives. Korean has no labial-velar or alveolar approximants. As for vowels, Korean lacks the lax counterparts of /i/ and /u/. It also lacks /æ/ (Lee, 1999). Finally, the syllable structure of English is more complex than that of Korean. For example, Korean does not allow syllable-initial consonant clusters and Korean learners of English may insert a vowel to break up a consonant cluster (e.g., Eckman & Iverson, 1993; Tarone, 1980).
Although there is no uniform agreement about the relative contributions of segments and intonation, the bulk of the evidence leads to the expectation that intonation errors will be the factor that most influences ratings of both comprehensibility and accentedness, with sentences that have native segments and nonnative intonation rated as less comprehensible and more accented than the sentences that have nonnative segments and native intonation. We also hypothesize, based on some previous literature, that nonnative intonation may reduce intelligibility. This overall pattern of data would suggest that intonation errors contribute more to comprehensibility, accentedness, and intelligibility than do segmental errors.
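The predictions above rest on a 2 × 2 crossing of segment source and intonation source, which yields the four sentence versions described in the abstract. As a minimal sketch, that design can be enumerated as follows; the dictionary keys, labels, and function name are illustrative assumptions and do not describe how the stimuli themselves were synthesized.

```python
from itertools import product

SOURCES = ("English", "Korean")


def build_conditions():
    """Cross segment source with intonation source to list the four versions."""
    conditions = []
    for segments, intonation in product(SOURCES, repeat=2):
        # Matching sources give a control; mismatched sources give a combination.
        kind = "control" if segments == intonation else "combination"
        conditions.append(
            {"segments": segments, "intonation": intonation, "kind": kind}
        )
    return conditions


for condition in build_conditions():
    print(condition)
```

Under the hypothesis stated above, the combination with English segments and Korean intonation would be expected to receive poorer accentedness and comprehensibility ratings than the combination with Korean segments and English intonation.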